RNA-seq RNAaccess identified as the preferred method for gene expression analysis of low-quality FFPE samples

Preserving clinical tumor tissues as formalin-fixed paraffin-embedded (FFPE) samples causes extensive cross-linking, fragmentation, and chemical modification of RNA, which poses significant challenges for RNA-seq-based gene expression profiling. This study sought to define an optimal RNA-seq protocol for FFPE samples. Using a common RNA extraction method, we compared RNA-seq library preparation protocols, including RNAaccess, RiboZero and PolyA, in terms of sequencing quality and concordance of gene expression between FFPE and case-matched fresh-frozen (FF) triple-negative breast cancer (TNBC) tissues. We found that RNAaccess, an exome-capture-based method, produced the most concordant results. Applying RNAaccess to FFPE gastric cancer tissues, we established a minimum RNA DV200 of 10% and a minimum RNA input of 10 ng that generated highly reproducible gene expression data. Lastly, we demonstrated that the RNAaccess and NanoString platforms produced highly concordant expression profiles from FFPE samples for shared genes; however, RNA-seq may be preferred for clinical biomarker discovery because of its broader coverage of the transcriptome. Taken together, these results support the selection of the RNAaccess RNA-seq method for gene expression profiling of FFPE samples. The minimum requirements for RNA quality and input established here may allow clinical FFPE samples of sub-optimal quality to be included in gene expression analyses, ultimately increasing the statistical power of such analyses.
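DV200 is the percentage of RNA fragments longer than 200 nucleotides. The sketch below shows how DV200 could be computed from a fragment-size profile and how a sample might be screened against the 10% DV200 and 10 ng input minimums reported in this study. It assumes a hypothetical input format of (fragment size in nt, signal) pairs rather than any particular instrument's export, and the function names are illustrative only.

```python
def dv200(profile):
    """Percentage of total signal contributed by fragments longer than 200 nt.

    `profile` is a list of (fragment_size_nt, signal) pairs; this input
    format is a hypothetical simplification of an electropherogram trace.
    """
    total = sum(signal for _, signal in profile)
    if total <= 0:
        raise ValueError("profile contains no signal")
    above_200 = sum(signal for size, signal in profile if size > 200)
    return 100.0 * above_200 / total


# Minimums reported in this study for RNAaccess on FFPE RNA.
MIN_DV200_PCT = 10.0
MIN_INPUT_NG = 10.0


def passes_rnaaccess_qc(profile, input_ng):
    """True if a sample meets both the DV200 and the RNA input minimums."""
    return dv200(profile) >= MIN_DV200_PCT and input_ng >= MIN_INPUT_NG


# Hypothetical, heavily degraded FFPE sample.
sample_profile = [(100, 60.0), (150, 25.0), (250, 10.0), (500, 5.0)]
print(dv200(sample_profile), passes_rnaaccess_qc(sample_profile, 10.0))  # 15.0 True
```

Both thresholds are treated as inclusive minimums, so a sample exactly at 10% DV200 with 10 ng of input passes.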

Library Preparation using Illumina Stranded Total RNA Ribo-Zero with Ribo-Zero Gold (RiboZero, a ribosomal RNA depletion method)
Cluster generation and sequencing of libraries are performed on the Illumina HiSeq. RNA samples are converted into cDNA libraries using the Illumina TruSeq Stranded Total RNA sample preparation kit. Briefly, total RNA samples are concentration-normalized, and ribosomal RNA (rRNA) is removed using biotinylated probes that selectively bind rRNA species. This process preserves mRNA as well as non-coding RNA species, including lincRNA, snRNA and snoRNA. The resulting rRNA-depleted RNA is fragmented using heat in the presence of divalent cations, with the fragmentation time adjusted according to the degree of degradation of the input RNA. Fragmented RNA is converted into double-stranded cDNA, with dUTP used in place of dTTP in the second-strand master mix. A single 'A' base is added to the cDNA, and forked adaptors that include index (barcode) sequences are attached by ligation. The resulting molecules are amplified by polymerase chain reaction (PCR). During PCR, the polymerase stalls when it encounters uracil in the template. Because only the second strand contains uracil, the first strand is the only viable template, which preserves the strand information. Final libraries are quantified, normalized and pooled.
Pooled libraries are bound to the surface of a flow cell, and each bound template molecule is clonally amplified up to 1000-fold to create individual clusters. Four fluorescently labeled nucleotides are then flowed over the surface of the flow cell and incorporated into each nucleic acid chain. Each nucleotide label acts as a terminator for polymerization, ensuring that a single base is added to each nascent chain during each cycle. Fluorescence is measured for each cluster during each cycle to identify the base that was added to that cluster. The dye is then cleaved off to allow incorporation of the next nucleotide during the next cycle.
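The strand-marking logic of the dUTP approach can be illustrated with a toy string model. In this sketch (hypothetical helper names; fragmentation, adaptors and PCR chemistry are not modeled), the second strand carries U wherever dUTP was incorporated, uracil-containing strands are discarded as PCR templates, and the orientation of the original transcript is recovered from the surviving first strand.

```python
# Toy model of dUTP second-strand marking. Sequences are plain strings;
# fragmentation, adaptors and PCR kinetics are deliberately not modeled.

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand(rna: str) -> str:
    """Reverse-transcribe RNA into first-strand cDNA (synthesized with dTTP)."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


def second_strand(first: str) -> str:
    """Synthesize the complementary strand with dUTP in place of dTTP."""
    complement = "".join(DNA_PAIR[base] for base in reversed(first))
    return complement.replace("T", "U")


def pcr_templates(strands: list[str]) -> list[str]:
    """Keep only strands the polymerase can copy; it stalls on uracil.

    In this toy model, an RNA fragment containing no U would yield an
    unmarked second strand; that edge case is ignored here.
    """
    return [s for s in strands if "U" not in s]


def transcript_from_template(template: str) -> str:
    """Recover the sense RNA sequence from a surviving (antisense) template."""
    return "".join(DNA_PAIR[base] for base in reversed(template)).replace("T", "U")


rna = "AUGGCUUACGGA"
fs = first_strand(rna)
ss = second_strand(fs)
survivors = pcr_templates([fs, ss])
print(fs, ss, survivors)  # TCCGTAAGCCAT AUGGCUUACGGA ['TCCGTAAGCCAT']
```

Because the only amplifiable template is always the reverse complement of the original RNA, the transcript's strand of origin can be recovered without ambiguity.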
Library Preparation using Illumina TruSeq RNA Access method (RNAaccess, now TruSeq Exome, an exome-capture method)
Cluster generation and sequencing of libraries are performed on the Illumina HiSeq. The TruSeq RNA Access method is a hybridization-based assay that enriches for coding RNAs from total RNA sequencing libraries. The assay consists of two major steps: (1) total RNA library preparation and (2) coding RNA enrichment.
Preparation of total RNA library: First-strand cDNA synthesis is primed from total RNA using random primers, followed by second-strand cDNA synthesis with dUTP used in place of dTTP in the master mix. This preserves strand information, because amplification in subsequent steps stalls when the polymerase encounters uracil in the template strand. Double-stranded cDNA undergoes end repair, A-tailing, and ligation of adapters that include index sequences. The resulting molecules are amplified by polymerase chain reaction (PCR), their yield and size distribution are determined, and their concentrations are normalized in preparation for the enrichment step.
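Library normalization is commonly based on molar rather than mass concentration. As a generic sketch, not the kit's documented procedure, the calculation below assumes an average mass of about 660 g/mol per base pair of double-stranded DNA and a mean fragment length taken from the size-distribution measurement; the concentration, fragment length, target molarity and volume in the example are hypothetical.

```python
# Approximate average molar mass of one base pair of double-stranded DNA (g/mol).
DSDNA_G_PER_MOL_PER_BP = 660.0


def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library mass concentration to molarity.

    nM = (ng/uL * 1e6) / (660 g/mol/bp * mean fragment length in bp)
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul * 1e6 / (DSDNA_G_PER_MOL_PER_BP * mean_fragment_bp)


def dilution_volume_ul(lib_nM: float, target_nM: float, final_volume_ul: float) -> float:
    """Library volume (uL) to dilute to final_volume_ul at target_nM (C1*V1 = C2*V2)."""
    if lib_nM < target_nM:
        raise ValueError("library is more dilute than the target concentration")
    return target_nM * final_volume_ul / lib_nM


# Hypothetical example: 6.6 ng/uL library with a 250 bp mean fragment size.
molarity = library_molarity_nM(6.6, 250)
volume = dilution_volume_ul(molarity, target_nM=4.0, final_volume_ul=20.0)
print(round(molarity, 2), round(volume, 2))  # 40.0 2.0
```

Normalizing by molarity rather than by mass puts libraries with different fragment-size distributions on an equal footing in terms of molecule count.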
Enrichment for coding RNA: Libraries are enriched for the mRNA fraction by positive selection using a cocktail of biotinylated oligonucleotide probes corresponding to coding regions of the genome. Targeted library molecules hybridized to the biotinylated probes are then captured with streptavidin-conjugated beads. After two rounds of hybridization and capture, the enriched library molecules undergo a second round of PCR amplification before sequencing on the Illumina HiSeq.
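Positive selection by hybridization capture can be approximated in silico by keeping library fragments that overlap a probe target. This is an idealized sketch with hypothetical names: fragments and targets are half-open genomic intervals, and capture is a perfect overlap filter. As a separate simplifying assumption, not taken from the protocol, the second function assumes each capture round keeps every on-target molecule but only a fixed fraction of off-target molecules, to illustrate why two rounds would be expected to raise the on-target fraction.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def capture(fragments: List[Interval], targets: List[Interval]) -> List[Interval]:
    """Idealized positive selection: keep fragments that overlap any probe target."""
    return [f for f in fragments if any(overlaps(f, t) for t in targets)]


def expected_on_target_fraction(on: float, off: float,
                                off_retention: float, rounds: int) -> float:
    """Expected on-target fraction after repeated capture rounds.

    Assumes every on-target molecule is retained in each round and each
    off-target molecule survives a round with probability off_retention.
    """
    off_left = off * off_retention ** rounds
    return on / (on + off_left)


fragments = [("chr1", 100, 250), ("chr1", 400, 550), ("chr2", 50, 200)]
targets = [("chr1", 200, 300), ("chr2", 180, 260)]  # hypothetical coding intervals
print(capture(fragments, targets))  # [('chr1', 100, 250), ('chr2', 50, 200)]
for r in (1, 2):
    print(r, round(expected_on_target_fraction(100, 900, 0.1, r), 3))
# 1 0.526
# 2 0.917
```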

Expression-based gene signature (TNBC set)
Applicability of signatures: Some of the signatures were originally developed in specific patient subgroups. The 76-gene signature has been validated only in node-negative disease [1,2], but we found that it was also a valid predictor in node-positive disease [3]. Although the Recurrence Score (RS) [4] has been applied only to ER-positive breast cancer, for completeness we included it along with the other signatures in the analyses of triple-negative disease.
Computing original gene signature scores: Risk scores were generated using the original algorithms of the signatures. For PAM50, each sample was assigned the subtype of the nearest of the five centroids, with distances calculated from the correlation to each centroid. For the 70-gene signature, the risk score for each sample was the negated correlation coefficient between its 70-gene expression profile and the good-prognosis centroid. A pseudo Oncotype DX® Recurrence Score was computed for each patient as the unscaled Recurrence Score [4]. For the 76-gene (Veridex) signature, a relapse score was calculated for each sample as the weighted sum of the log2 gene expression values of the 16 ER-negative markers in the signature. A hypoxia score was computed for each patient by averaging the expression levels of the hypoxia-response genes [5,6].
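The scoring rules above reduce to a few simple operations on expression vectors. The following sketch implements them on toy data: nearest-centroid assignment by correlation (PAM50), negated correlation with a good-prognosis centroid (70-gene), a weighted sum of log2 expression values (76-gene), and the mean expression of signature genes (hypoxia). Pearson correlation is used for brevity; the published signatures may use a different correlation measure or additional terms. All gene vectors, centroids and weights are hypothetical, and the Recurrence Score is not reproduced.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def nearest_centroid(sample: Sequence[float],
                     centroids: Dict[str, Sequence[float]]) -> str:
    """PAM50-style call: the subtype whose centroid is most correlated with the sample."""
    return max(centroids, key=lambda name: pearson(sample, centroids[name]))


def seventy_gene_risk(sample: Sequence[float], good_centroid: Sequence[float]) -> float:
    """Negated correlation with the good-prognosis centroid (higher = worse)."""
    return -pearson(sample, good_centroid)


def weighted_sum_score(log2_expr: Sequence[float], weights: Sequence[float]) -> float:
    """76-gene-style relapse score: sum of weight * log2 expression."""
    if len(log2_expr) != len(weights):
        raise ValueError("expression and weight vectors differ in length")
    return sum(w * x for w, x in zip(weights, log2_expr))


def mean_score(expr: Sequence[float]) -> float:
    """Hypoxia-style score: average expression of the signature genes."""
    return sum(expr) / len(expr)


sample = [2.0, 1.0, 4.0, 3.0]
centroids = {"Basal": [2.1, 0.9, 3.8, 3.2], "LumA": [4.0, 3.0, 1.0, 2.0]}
good_prognosis = [4.0, 3.0, 1.0, 2.0]
print(nearest_centroid(sample, centroids))                    # Basal
print(round(seventy_gene_risk(sample, good_prognosis), 3))    # 0.8
print(weighted_sum_score([1.0, 2.0, 3.0], [0.5, -1.0, 2.0]))  # 4.5
print(mean_score([1.0, 2.0, 3.0, 6.0]))                       # 3.0
```

Negating the correlation makes the 70-gene score increase with dissimilarity to the good-prognosis profile, so that higher values consistently indicate higher risk.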